Include the names of your collaborators here.
Vishruth Reddy
This homework is dedicated to working with logistic regression for
binary classification. You will explore a binary classification
application and fit non-Bayesian logistic regression models by
maximizing the likelihood via the glm() function in R. Then
you will fit Bayesian logistic regression models with the Laplace
Approximation by programming the log-posterior function. You will
identify the best model and make predictions to study the trends in the
predicted event probability. You will conclude the application by tuning
a non-Bayesian logistic regression model with the elastic net penalty to
help identify the most important features in the model. You
will visualize the predicted event probability trends from the tuned
elastic net model and compare the results with your Bayesian models.
You will use the tidyverse, coefplot,
broom, MASS, glmnet, and
caret packages in this assignment.
IMPORTANT: The RMarkdown assumes you have downloaded the data sets (CSV files) to the same directory you saved the template Rmarkdown file. If you do not have the CSV files in the correct location, the data will not be loaded correctly.
Certain code chunks are created for you. Each code chunk has
eval=FALSE set in the chunk options. You
MUST change it to be eval=TRUE in order
for the code chunks to be evaluated when rendering the document.
You are free to add more code chunks if you would like.
This assignment will use packages from the tidyverse
suite.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
This assignment also uses the broom package. The
broom package is part of tidyverse and so you
have it installed already. The broom package will be used
via the :: operator later in the assignment and so you do
not need to load it directly.
The caret package will be loaded later in the assignment
and the glmnet package will be loaded via
caret.
The primary data set you will work with in this assignment is loaded for you in the code chunk below.
df1 <- readr::read_csv('hw10_binary_01.csv', col_names = TRUE)
## Rows: 225 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): x1
## dbl (3): x2, x3, y
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The data consists of 3 inputs, x1, x2, and
x3, and one binary outcome, y. The glimpse
below shows that x1 is a character and thus is a
categorical input. The x2 and x3 inputs are
continuous. The binary outcome has been encoded as y=1 for
the EVENT and y=0 for the NON-EVENT.
df1 %>% glimpse()
## Rows: 225
## Columns: 4
## $ x1 <chr> "C", "B", "C", "B", "A", "C", "A", "A", "C", "C", "B", "C", "A", "C…
## $ x2 <dbl> 1.682873632, -1.033648456, 0.110854156, 2.032934019, -0.225540507, …
## $ x3 <dbl> -0.353085685, -0.778102544, 0.757536960, 0.639465847, 0.017483448, …
## $ y <dbl> 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0…
It is always best to explore data before training predictive models. This assignment does not require you to create all figures necessary to sufficiently explore the data. This assignment focuses on various ways of exploring the relationships between the binary outcome and the inputs. You will thus not consider input correlation plots in this assignment. Please note that the inputs have already been standardized for you to streamline the visualizations and modeling. Realistic applications like the final project may have inputs with vastly different scales and so you will need to execute the preprocessing as part of the model training.
The code chunk below reshapes the wide-format df1 data
into long-format, lf1. The continuous inputs,
x2 and x3, are “gathered” and “stacked” on top
of each other. The long-format data supports using facets associated
with the continuous inputs. You will use the long-format data in some of
the visualizations below.
lf1 <- df1 %>%
tibble::rowid_to_column() %>%
pivot_longer(c(x2, x3))
lf1
## # A tibble: 450 × 5
## rowid x1 y name value
## <int> <chr> <dbl> <chr> <dbl>
## 1 1 C 1 x2 1.68
## 2 1 C 1 x3 -0.353
## 3 2 B 0 x2 -1.03
## 4 2 B 0 x3 -0.778
## 5 3 C 0 x2 0.111
## 6 3 C 0 x3 0.758
## 7 4 B 0 x2 2.03
## 8 4 B 0 x3 0.639
## 9 5 A 0 x2 -0.226
## 10 5 A 0 x3 0.0175
## # … with 440 more rows
The glimpse below shows the continuous input names are contained in
the name column and their values are contained in the
value column.
lf1 %>% glimpse()
## Rows: 450
## Columns: 5
## $ rowid <int> 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11…
## $ x1 <chr> "C", "C", "B", "B", "C", "C", "B", "B", "A", "A", "C", "C", "A",…
## $ y <dbl> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0…
## $ name <chr> "x2", "x3", "x2", "x3", "x2", "x3", "x2", "x3", "x2", "x3", "x2"…
## $ value <dbl> 1.682873632, -0.353085685, -1.033648456, -0.778102544, 0.1108541…
Let’s start with exploring the inputs.
Visualize the distributions of the continuous inputs as
histograms in ggplot2.
It is up to you as to whether you use the wide format or long-format data. If you use the wide-format data you should create two separate figures. If you use the long-format data you should use facets for each continuous input.
Add your code chunks here.
lf1 %>% ggplot(mapping = aes(x = value)) +
geom_histogram(bins = 25) +
facet_wrap(~name)
Visualize the counts of the categorical input x1
with a bar chart in ggplot2. You MUST use the wide-format
data for this visualization.
Add your code chunks here.
df1 %>% ggplot(mapping = aes(x = x1)) + geom_bar()
Let’s examine if there are differences in the continuous input distributions based on the categorical input. You will use boxplots to focus on the distribution summary statistics.
You must use the long-format data for this figure. Create a
boxplot in ggplot2 where the x aesthetic is
the categorical input and the y aesthetic is the
value column. You must include facets associated with the
name column.
Add your code chunks here.
lf1 %>% ggplot(mapping = aes(x = x1, y = value)) + geom_boxplot() + facet_wrap(~name)
Let’s now focus on the binary outcome.
Visualize the counts of the binary outcome y
with a bar chart in ggplot2. You MUST use the wide-format
data for this visualization.
Is the binary outcome balanced?
Add your code chunks here.
What do you think?
No, it is not balanced. There are more instances of y = 0 than y = 1.
df1 %>% ggplot(mapping = aes(x = y)) + geom_bar()
Let’s see if the categorical input impacts the binary outcome.
Create a bar chart for the categorical input x1
with ggplot2 like you did in 1b). However, you must also
map the fill aesthetic to
as.factor(y).
The data type conversion function as.factor() can be
applied in-line. This will force a categorical fill palette to be used
rather than a continuous palette.
Add your code chunks here.
df1 %>% ggplot(mapping = aes(x = x1, fill = as.factor(y))) + geom_bar()
Let’s now visualize the binary outcome with respect to the continuous
inputs. You will use the geom_jitter() function instead of
the geom_point() function to create the scatter plot. The
geom_jitter() function adds small amounts of random noise
to “jitter” or perturb the locations of the points. This will make it
easier to see the individual observations of the binary outcome. You
MUST use the long-format data for this question.
Pipe the long-format data to ggplot() and map
the x aesthetic to the value variable and the
y aesthetic to the y variable. Add a
geom_jitter() layer where you specify the
height and width arguments to be 0.02 and 0,
respectively. Do NOT set height and width
within the aes() function. Facet by the continuous inputs
by including a facet_wrap() layer with the facets “a
function of” the name column.
Add your code chunks here.
lf1 %>% ggplot(mapping = aes(x = value, y = y)) + geom_jitter(height = 0.02, width = 0) + facet_wrap(~name)
We can include a logistic regression smoother to help visualize the
changes in the event probability. You will use the
geom_smooth() function to do so, but you will need to
change the arguments compared to previous assignments that focused on
regression.
Create the same plot as 1f) but include
geom_smooth() as a layer between geom_jitter()
and facet_wrap(). Specify the formula argument
to y ~ x, the method argument to be
glm, and the method.args argument to be
list(family = 'binomial').
The formula argument are “local” variables associated
with the aesthetics. Thus the formula y ~ x means the
y aesthetic is linearly related to the x
aesthetic. However, by specifying method = glm and
method.args = list(family = 'binomial') you are instructing
geom_smooth() to fit a logistic regression model. Thus, you
are actually specifying that the linear predictor, the log-odds
ratio is linearly related to the x aesthetic.
Add your code chunks here.
lf1 %>% ggplot(mapping = aes(x = value, y = y)) + geom_jitter(height = 0.02, width = 0) + geom_smooth(formula = y ~ x, method = glm, method.args = list(family = 'binomial')) + facet_wrap(~name)
Let’s check if the categorical input influences the event probability trends with respect to the continuous inputs.
Create the same figure as in 1g), except this time map the
color and fill aesthetics within the
geom_smooth() layer to the x1 variable. You
must also map the color aesthetic within the
geom_jitter() layer to the x1
variable.
Based on your figure do the trends appear to depend on the categorical input?
Add your code chunks here.
What do you think?
Yes, the trends seem to depend on the categorical input. The trends change across the categories A, B, and C, as denoted by the different colors in the figure.
lf1 %>% ggplot(mapping = aes(x = value, y = y)) + geom_jitter(height = 0.02, width = 0, mapping = aes(color = x1)) + geom_smooth(formula = y ~ x, method = glm, method.args = list(family = 'binomial'), mapping = aes(color = x1, fill = x1)) + facet_wrap(~name)
The previous figures used the “basic” formula of y ~ x
within geom_smooth(). However, we can try more complex
relationships within geom_smooth(). For example, let’s
consider quadratic relationships between the log-odds ratio (linear
predictor) and the continuous inputs.
Create the same figure as 1h), except this time specify the
formula argument to be y ~ x + I(x^2). Use the
same set of aesthetics as 1h) including the color and
fill aesthetics.
Does the quadratic relationship appear to be consistent with the data for either of the 2 inputs?
Add your code chunks here.
What do you think?
Yes, the quadratic relationships appear consistent with the data for both continuous inputs.
lf1 %>% ggplot(mapping = aes(x = value, y = y)) + geom_jitter(height = 0.02, width = 0, mapping = aes(color = x1)) + geom_smooth(formula = y ~ x + I(x^2), method = glm, method.args = list(family = 'binomial'), mapping = aes(color = x1, fill = x1)) + facet_wrap(~name)
Now that you have explored the data, it’s time to start modeling! You
will fit multiple non-Bayesian logistic regression models using
glm(). These models will range from simple to complex. You
do not need to standardize the continuous inputs; they have already been
standardized for you. You can focus on fitting the models.
BE CAREFUL!! Make sure you set the
family argument in glm() correctly!!! The
family argument was discussed earlier in the semester.
You must fit the following models:
A: Categorical input only
B: Linear additive features using the continuous inputs only
C: Linear additive features using all inputs (categorical and
continuous)
D: Interact the categorical input with the continuous inputs
E: Add the categorical input to linear main effects and interaction
between the continuous inputs
F: Interact the categorical input with main effect and interaction
features associated with the continuous inputs
G: Add the categorical input to the linear main continuous effects,
interaction between continuous, and quadratic continuous features
H: Interact the categorical input with the linear main continuous effects,
interaction between continuous, and quadratic continuous features
You must name your models modA through
modH.
You do not need to fit all models in a single code chunk.
Add your code chunks here.
modA = glm(formula = y ~ x1, family = 'binomial', data = df1)
modB = glm(formula = y ~ x2 + x3, family = 'binomial', data = df1)
modC = glm(formula = y ~ x2 + x3 + x1, family = 'binomial', data = df1)
modD = glm(formula = y ~ x1:x3 + x1:x2, family = 'binomial', data = df1)
modE = glm(formula = y ~ x1 + x2 + x3 + x2:x3, family = 'binomial', data = df1)
modF = glm(formula = y ~ x1:(x2 + x3) + x1:(x2:x3), family = 'binomial', data = df1)
modG = glm(formula = y ~ x1 + x2 + x3 + (x2:x3) +(I(x2^2):I(x3^2)), family = 'binomial', data = df1)
modH = glm(formula = y ~ x1:(x2 + x3) + x1:(x2:x3) + x1:(I(x2^2) : I(x3^2)), family = 'binomial', data = df1)
Which of the 8 models is the best? Which of the 8 models is the second best?
State the performance metric you used to make your selection.
HINT: The broom::glance() function will help
here…
Using BIC as the performance metric, the best model is modG and the second best is modD.
Add your code chunks here.
broom::glance(modA)
## # A tibble: 1 × 8
## null.deviance df.null logLik AIC BIC deviance df.residual nobs
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 281. 224 -137. 279. 290. 273. 222 225
broom::glance(modB)
## # A tibble: 1 × 8
## null.deviance df.null logLik AIC BIC deviance df.residual nobs
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 281. 224 -132. 270. 281. 264. 222 225
broom::glance(modC)
## # A tibble: 1 × 8
## null.deviance df.null logLik AIC BIC deviance df.residual nobs
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 281. 224 -128. 265. 282. 255. 220 225
broom::glance(modD)
## # A tibble: 1 × 8
## null.deviance df.null logLik AIC BIC deviance df.residual nobs
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 281. 224 -121. 256. 280. 242. 218 225
broom::glance(modE)
## # A tibble: 1 × 8
## null.deviance df.null logLik AIC BIC deviance df.residual nobs
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 281. 224 -128. 267. 288. 255. 219 225
broom::glance(modF)
## # A tibble: 1 × 8
## null.deviance df.null logLik AIC BIC deviance df.residual nobs
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 281. 224 -117. 254. 289. 234. 215 225
broom::glance(modG)
## # A tibble: 1 × 8
## null.deviance df.null logLik AIC BIC deviance df.residual nobs
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 281. 224 -121. 255. 279. 241. 218 225
broom::glance(modH)
## # A tibble: 1 × 8
## null.deviance df.null logLik AIC BIC deviance df.residual nobs
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 281. 224 -105. 236. 281. 210. 212 225
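The eight separate broom::glance() calls above can be condensed into one sorted table. A minimal sketch on built-in data (the stand-in models m1 and m2 fit to mtcars are placeholders; with the homework objects you would supply list(A = modA, B = modB, ..., H = modH)):

```r
library(tidyverse)

# Stand-in logistic regression models on built-in data; replace this
# list with the homework's modA through modH objects
m1 <- glm(am ~ mpg, family = 'binomial', data = mtcars)
m2 <- glm(am ~ mpg + wt, family = 'binomial', data = mtcars)

# Stack the glance() summaries and sort so the lowest (best) BIC is first
bic_table <- purrr::map_dfr(list(m1 = m1, m2 = m2), broom::glance, .id = "model") %>%
  dplyr::arrange(BIC)

bic_table %>% dplyr::select(model, AIC, BIC)
```

The .id argument keeps the model name as a column, so the top row of the sorted table immediately identifies the best model by BIC.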
Create the coefficient plot associated with your best and second best models. How many coefficients are statistically significant in each model?
You should use the coefplot::coefplot() function to
create the plots.
Add your code chunks here.
coefplot::coefplot(modG)
coefplot::coefplot(modD)
Now that you have an idea about the relationships, it’s time to consider a more detailed view of the uncertainty by fitting Bayesian logistic regression models. You defined log-posterior functions for linear models in previous assignments. You worked with simple linear relationships, interactions, polynomials, and more complex spline basis features. In lecture, we discussed how the linear model framework can be generalized to handle non-continuous binary outcomes. The likelihood changed from a Gaussian to a Binomial distribution and a non-linear link function is required. In this way, the regression model is applied to a linear predictor which “behaves” like the trend in an ordinary linear model. In this problem, you will define the log-posterior function for logistic regression. By doing so you will be able to directly contrast what you did to define the log-posterior function for the linear model in previous assignments.
The complete probability model for logistic regression consists of the likelihood of the response \(y_n\) given the event probability \(\mu_n\), the inverse link function between the probability of the event, \(\mu_n\), and the linear predictor, \(\eta_n\), and the prior on all linear predictor model coefficients \(\boldsymbol{\beta}\).
As in lecture, you will assume that the \(\boldsymbol{\beta}\)-parameters are a-priori independent Gaussians with a shared prior mean \(\mu_{\beta}\) and a shared prior standard deviation \(\tau_{\beta}\).
Write out the complete probability model for logistic regression. You must write out the \(n\)-th observation’s linear predictor using the inner product of the \(n\)-th row of a design matrix \(\mathbf{x}_{n,:}\) and the unknown \(\boldsymbol{\beta}\)-parameter column vector. You can assume that the number of unknown coefficients is equal to \(D + 1\).
You are allowed to separate each equation into its own equation block.
HINT: The “given” sign, the vertical line, \(\mid\), is created by typing
\mid in a latex math expression. The product symbol (the
giant PI) is created with prod_{}^{}.
Add your equation blocks here.
Likelihood for the \(n\)-th observation:
\[ y_{n} \mid \mu_{n} \sim \mathrm{Bernoulli}\left(y_{n} \mid \mu_{n}\right) \]
Expanding the Bernoulli likelihood over all \(N\) observations:
\[ p\left(\mathbf{y} \mid \boldsymbol{\mu}\right) = \prod_{n = 1}^{N}{\mu_n^{y_n} \left(1 - \mu_n\right)^{1 - y_n}} \]
The event probability is related to the linear predictor through the inverse link (logistic) function:
\[ \mu_n = \mathrm{logit}^{-1}\left(\eta_n\right) \]
The \(n\)-th linear predictor is the inner product of the \(n\)-th row of the design matrix and the unknown \(\boldsymbol{\beta}\) column vector:
\[ \eta_n = \sum_{d = 0}^{D}{\beta_d x_{n,d}} = \mathbf{x}_{n,:}\boldsymbol{\beta} \]
Prior on all \(\boldsymbol{\beta}\)-parameters, a-priori independent Gaussians with shared prior mean \(\mu_{\beta}\) and shared prior standard deviation \(\tau_{\beta}\):
\[ p\left(\boldsymbol{\beta}\right) = \prod_{d = 0}^{D}{\mathrm{normal}\left(\beta_d \mid \mu_{\beta}, \tau_{\beta}\right)} \]
The un-normalized posterior on the \(\boldsymbol{\beta}\)-parameters is the product of the likelihood and the prior:
\[ p\left(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X}\right) \propto \prod_{n = 1}^{N}{\mu_n^{y_n} \left(1 - \mu_n\right)^{1 - y_n}} \times \prod_{d = 0}^{D}{\mathrm{normal}\left(\beta_d \mid \mu_{\beta}, \tau_{\beta}\right)} \]
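On the log scale, the quantity a log-posterior function evaluates (up to an additive constant) is the sum of the log-likelihood and the log-prior:
\[ \log p\left(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X}\right) = \sum_{n=1}^{N} \left[ y_n \log \mu_n + \left(1 - y_n\right) \log\left(1 - \mu_n\right) \right] + \sum_{d=0}^{D} \log\left[\mathrm{normal}\left(\beta_d \mid \mu_{\beta}, \tau_{\beta}\right)\right] + \mathrm{constant} \]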
You will fit 8 Bayesian logistic regression models using the same linear predictor trend expressions that you used in the non-Bayesian logistic regression models. You will program the log-posterior function in the same style as the linear model log-posterior functions. This allows you to use the same Laplace Approximation strategy to execute the Bayesian inference.
You must first define the design matrices for each of the 8
models. You must name the design matrices Xmat_A through
Xmat_H. As a reminder, the 8 models are provided
below.
A: Categorical input only
B: Linear additive features using the continuous inputs only
C: Linear additive features using all inputs (categorical and
continuous)
D: Interact the categorical input with the continuous inputs
E: Add the categorical input to linear main effects and interaction
between the continuous inputs
F: Interact the categorical input with main effect and interaction
features associated with the continuous inputs
G: Add the categorical input to the linear main continuous effects,
interaction between continuous, and quadratic continuous features
H: Interact the categorical input with the linear main continuous effects,
interaction between continuous, and quadratic continuous features
Create the design matrices for the 8 models. Add your code chunks below.
Xmat_A = model.matrix(y ~ x1, data = df1)
Xmat_B = model.matrix(y ~ x2 + x3, data = df1)
Xmat_C = model.matrix(y ~ x2 + x3 + x1, data = df1)
Xmat_D = model.matrix(y ~ x1:x3 + x1:x2, data = df1)
Xmat_E = model.matrix(y ~ x1 + x2 + x3 + x2:x3, data = df1)
Xmat_F = model.matrix(y ~ x1:(x2 + x3) + x1:(x2:x3), data = df1)
Xmat_G = model.matrix(y ~ x1 + x2 + x3 + (x2:x3) +(I(x2^2):I(x3^2)), data = df1)
Xmat_H = model.matrix(y ~ x1:(x2 + x3) + x1:(x2:x3) + x1:(I(x2^2) : I(x3^2)), data = df1)
The log-posterior function you will program requires the design
matrix, the observed output vector, and the prior specification. In
previous assignments, you provided that information with a list. You
will do the same thing in this assignment. The code chunk below is
started for you. The lists follow the same naming convention as the
design matrices. The info_A list corresponds to the
information for model A, while info_H corresponds to the
information for model H. You must assign the design matrix to the
corresponding list and complete the rest of the required information.
The observed binary outcome vector must be assigned to the
yobs field. The prior mean and prior standard deviation
must be assigned to the mu_beta and tau_beta
fields, respectively.
Complete the code chunk below by completing the lists of required for each model. The list names are consistent with the design matrix names you defined in the previous problem. You must use a prior mean of 0 and a prior standard deviation of 4.5.
info_A <- list(
yobs = df1$y,
design_matrix = Xmat_A,
mu_beta = 0,
tau_beta = 4.5
)
info_B <- list(
yobs = df1$y,
design_matrix = Xmat_B,
mu_beta = 0,
tau_beta = 4.5
)
info_C <- list(
yobs = df1$y,
design_matrix = Xmat_C,
mu_beta = 0,
tau_beta = 4.5
)
info_D <- list(
yobs = df1$y,
design_matrix = Xmat_D,
mu_beta = 0,
tau_beta = 4.5
)
info_E <- list(
yobs = df1$y,
design_matrix = Xmat_E,
mu_beta = 0,
tau_beta = 4.5
)
info_F <- list(
yobs = df1$y,
design_matrix = Xmat_F,
mu_beta = 0,
tau_beta = 4.5
)
info_G <- list(
yobs = df1$y,
design_matrix = Xmat_G,
mu_beta = 0,
tau_beta = 4.5
)
info_H <- list(
yobs = df1$y,
design_matrix = Xmat_H,
mu_beta = 0,
tau_beta = 4.5
)
You will now define the log-posterior function for logistic
regression, logistic_logpost(). The first argument to
logistic_logpost() is the vector of unknowns and the second
argument is the list of required information. You will assume that the
variables within the my_info list are those contained in
the info_A through info_H lists you defined
previously.
Complete the code chunk to define the
logistic_logpost() function. The comments describe what you
need to fill in. Do you need to separate out the \(\boldsymbol{\beta}\)-parameters from the
vector of unknowns?
After you complete logistic_logpost(), test it
by setting the unknowns vector to be a vector of -1’s and
then 2’s for the model A case (the model with only the categorical
input). If you have successfully programmed the function you should get
-164.6906 and -541.6987 for the -1 test case
and +2 test case, respectively.
Do you need to separate the \(\boldsymbol{\beta}\)-parameters from the
unknowns vector?
No. Unlike the linear model, there is no likelihood noise parameter \(\sigma\); every unknown is a \(\boldsymbol{\beta}\)-parameter, so no separation is needed.
logistic_logpost <- function(unknowns, my_info)
{
# extract the design matrix and assign to X
X <- my_info$design_matrix
# calculate the linear predictor
eta <- X %*% as.matrix(unknowns)
# calculate the event probability
mu <- boot::inv.logit(eta)
# evaluate the log-likelihood
log_lik <- sum(dbinom(x = my_info$yobs, size = 1, prob = mu, log = TRUE))
# evaluate the log-prior
log_prior <- sum(dnorm(x = unknowns, mean = my_info$mu_beta, sd = my_info$tau_beta, log = TRUE))
# sum together
log_lik + log_prior
}
Test out your function using the info_A information and
setting the unknowns to a vector of -1’s.
dim(Xmat_A)
## [1] 225 3
###
logistic_logpost(c(-1, -1 , -1), info_A)
## [1] -164.6906
Test out your function using the info_A information and
setting the unknowns to a vector of 2’s.
###
logistic_logpost(c(2, 2 , 2), info_A)
## [1] -541.6987
The my_laplace() function is provided to you in the code
chunk below.
my_laplace <- function(start_guess, logpost_func, ...)
{
# code adapted from the `LearnBayes` function `laplace()`
fit <- optim(start_guess,
logpost_func,
gr = NULL,
...,
method = "BFGS",
hessian = TRUE,
control = list(fnscale = -1, maxit = 5001))
mode <- fit$par
post_var_matrix <- -solve(fit$hessian)
p <- length(mode)
int <- p/2 * log(2 * pi) + 0.5 * log(det(post_var_matrix)) + logpost_func(mode, ...)
# package all of the results into a list
list(mode = mode,
var_matrix = post_var_matrix,
log_evidence = int,
converge = ifelse(fit$convergence == 0,
"YES",
"NO"),
iter_counts = as.numeric(fit$counts[1]))
}
You will use my_laplace() to execute the Laplace
Approximation for all 8 models. You must use an initial guess of zero
for all unknowns for each model.
Perform the Laplace Approximation for all 8 models. Assign
the results to the laplace_A through laplace_H
objects. The names should be consistent with the design matrices and
lists of required information. Thus, laplace_A must
correspond to the info_A and Xmat_A
objects.
Should you be concerned that the initial guess will impact the results?
Does the initial guess matter?
Not really. The logistic regression log-posterior with independent Gaussian priors is concave, so there is a single mode; the initial guess should not change the converged result, only the number of iterations required.
Add the code chunks here.
dim(Xmat_C)
## [1] 225 5
rep(0, 8)
## [1] 0 0 0 0 0 0 0 0
laplace_A <- my_laplace(c(0,0,0), logistic_logpost, info_A)
laplace_B <- my_laplace(c(0,0,0), logistic_logpost, info_B)
laplace_C <- my_laplace(rep(0,ncol(info_C$design_matrix)), logistic_logpost, info_C)
laplace_D <- my_laplace(rep(0,ncol(info_D$design_matrix)), logistic_logpost, info_D)
laplace_E <- my_laplace(rep(0,ncol(info_E$design_matrix)), logistic_logpost, info_E)
laplace_F <- my_laplace(rep(0,ncol(info_F$design_matrix)), logistic_logpost, info_F)
laplace_G <- my_laplace(rep(0,ncol(info_G$design_matrix)), logistic_logpost, info_G)
laplace_H <- my_laplace(rep(0,ncol(info_H$design_matrix)), logistic_logpost, info_H)
The laplace_A object is the Bayesian version of
modA that you fit previously in Problem 2a). Let’s compare
the Bayesian result to the non-Bayesian result.
Calculate the 95% confidence interval on the regression
coefficients associated with laplace_A and display the
interval bounds to the screen. Which features are statistically
significant according to the Bayesian model? Are the results consistent
with the non-Bayesian model, modA?
Yes, they are consistent. Based on the Bayesian intervals below, the intercept and x1B coefficients are statistically significant (their intervals exclude zero) while x1C is not, which matches the non-Bayesian coefficient plot for modA.
Add your code chunks here.
laplace_A
## $mode
## [1] -0.55500941 -0.84527527 0.04408551
##
## $var_matrix
## [,1] [,2] [,3]
## [1,] 0.05782617 -0.05757377 -0.05767426
## [2,] -0.05757377 0.14570798 0.05742253
## [3,] -0.05767426 0.05742253 0.11071731
##
## $log_evidence
## [1] -145.3736
##
## $converge
## [1] "YES"
##
## $iter_counts
## [1] 18
intercept_low = laplace_A$mode[1] - 2 * sqrt(diag(laplace_A$var_matrix)[1])
intercept_low
## [1] -1.035951
intercept_up = laplace_A$mode[1] + 2 * sqrt(diag(laplace_A$var_matrix)[1])
intercept_up
## [1] -0.07406797
b2_low = laplace_A$mode[2] - 2 * sqrt(diag(laplace_A$var_matrix)[2])
b2_low
## [1] -1.60871
b2_up = laplace_A$mode[2] + 2 * sqrt(diag(laplace_A$var_matrix)[2])
b2_up
## [1] -0.08184096
b3_low = laplace_A$mode[3] - 2 * sqrt(diag(laplace_A$var_matrix)[3])
b3_low
## [1] -0.6213987
b3_up = laplace_A$mode[3] + 2 * sqrt(diag(laplace_A$var_matrix)[3])
b3_up
## [1] 0.7095697
coefplot::coefplot(modA)
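The element-by-element interval calculations above can be vectorized: the posterior standard deviations are the square roots of the diagonal of the posterior covariance matrix, and all interval bounds follow at once. A sketch using a fake Laplace result (the mode and var_matrix values below are stand-ins patterned after laplace_A, not the actual fitted object):

```r
# Fake Laplace Approximation result standing in for laplace_A
laplace_fake <- list(mode = c(-0.555, -0.845, 0.044),
                     var_matrix = diag(c(0.0578, 0.1457, 0.1107)))

# Posterior standard deviations from the diagonal of the covariance matrix
post_sd <- sqrt(diag(laplace_fake$var_matrix))

# Approximate 95% intervals for every coefficient at once
ci_95 <- cbind(lower = laplace_fake$mode - 2 * post_sd,
               upper = laplace_fake$mode + 2 * post_sd)

ci_95
```

A coefficient is flagged as statistically significant whenever its row of ci_95 excludes zero.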
You trained 8 Bayesian logistic regression models. Let’s identify the best using the Evidence based approach!
Calculate the posterior model weight associated with each of
the 8 models. Create a bar chart that shows the posterior model weight
for each model. The models should be named 'A' through
'H'.
Add your code chunks here.
denom = exp(laplace_A$log_evidence) + exp(laplace_B$log_evidence) +
exp(laplace_C$log_evidence) + exp(laplace_D$log_evidence) +
exp(laplace_E$log_evidence) + exp(laplace_F$log_evidence) +
exp(laplace_G$log_evidence) + exp(laplace_H$log_evidence)
A = exp(laplace_A$log_evidence)/ denom
B = exp(laplace_B$log_evidence)/ denom
C = exp(laplace_C$log_evidence)/ denom
D = exp(laplace_D$log_evidence)/ denom
E = exp(laplace_E$log_evidence)/ denom
F_ = exp(laplace_F$log_evidence)/ denom
G = exp(laplace_G$log_evidence)/ denom
H = exp(laplace_H$log_evidence)/ denom
A
## [1] 0.004302155
B
## [1] 0.08281358
C
## [1] 0.05530455
D
## [1] 0.3693115
E
## [1] 0.001955291
F_
## [1] 0.006310524
G
## [1] 0.07042938
H
## [1] 0.409573
ggplot() +
geom_col(aes(x = c('A','B','C','D','E','F','G','H'), y = c(A, B, C, D, E, F_, G, H)))
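Exponentiating the raw log-evidence values can underflow to zero when they are large negative numbers. A more numerically stable sketch subtracts the maximum log-evidence before exponentiating; the shift cancels in the ratio, so the weights are unchanged. The log_ev values below are hypothetical stand-ins for laplace_A$log_evidence through laplace_H$log_evidence:

```r
# Hypothetical log-evidence values, one per model (stand-ins)
log_ev <- c(A = -145.4, B = -142.4, C = -142.8, D = -140.9,
            E = -146.2, F = -145.0, G = -142.6, H = -140.8)

# Shift by the maximum so the largest exponentiated term is exp(0) = 1,
# avoiding underflow; the shift cancels when normalizing
w <- exp(log_ev - max(log_ev))
post_wt <- w / sum(w)

post_wt
```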
Is your Bayesian identified best model consistent with the non-Bayesian identified best model?
What do you think?
No, it is not. Previously the best model was G, but now it is H. Model D is second best in both cases.
You trained multiple models ranging from simple to complex. You
identified the best model using several approaches. It is now time to
examine the predictive trends of the models to better interpret their
behavior. You will not predict the training set to study the trends.
Instead, you will visualize the trends on a specifically designed
prediction grid. The code chunk below defines that grid for you using
the expand.grid() function. If you look closely, the
x3 variable has 51 evenly spaced points between -3 and 3.
The x1 variable has 9 unique values evenly spaced between
-3 and 3. These lower and upper bounds are roughly consistent with the
training set bounds. The x1 variable consists of the 3
unique values present in the training set. The
expand.grid() function creates the full-factorial
combination of the 3 variables.
viz_grid <- expand.grid(x1 = unique(df1$x1),
x2 = seq(-3, 3, length.out = 9),
x3 = seq(-3, 3, length.out = 51),
KEEP.OUT.ATTRS = FALSE,
stringsAsFactors = FALSE) %>%
as.data.frame() %>% tibble::as_tibble()
viz_grid %>% glimpse()
## Rows: 1,377
## Columns: 3
## $ x1 <chr> "C", "B", "A", "C", "B", "A", "C", "B", "A", "C", "B", "A", "C", "B…
## $ x2 <dbl> -3.00, -3.00, -3.00, -2.25, -2.25, -2.25, -1.50, -1.50, -1.50, -0.7…
## $ x3 <dbl> -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3,…
The glimpse provided above shows there are 1377 combinations of the 3 inputs. You will therefore make over 1300 predictions to study the trends of the event probability!
As with previous linear model assignments, you must first generate
random posterior samples of the unknown parameters from the Laplace
Approximation assumed Multivariate Normal (MVN) distribution. Although
you were able to apply the my_laplace() function to both
the regression and logistic regression settings, you cannot directly
apply the generate_lm_post_samples() function from your
previous assignments. You will therefore adapt
generate_lm_post_samples() and define
generate_glm_post_samples(). The code chunk below starts
the function for you and uses just two input arguments,
mvn_result and num_samples. You must complete
the function.
Why can you not directly use the
generate_lm_post_samples() function? Since the
length_beta argument is NOT provided to
generate_glm_post_samples(), how can you determine the
number of \(\boldsymbol{\beta}\)-parameters? Complete
the code chunk below by first assigning the number of \(\boldsymbol{\beta}\)-parameters to the
length_beta variable. Then generate the random samples from
the MVN distribution. You do not have to name the variables, you only
need to call the correct random number generator.
What do you think? Why do we need a new function compared to the previous assignments?
The linear model’s vector of unknowns included the likelihood noise term (the unbounded \(\varphi = \log \sigma\)) in addition to the \(\boldsymbol{\beta}\)-parameters, so generate_lm_post_samples() needed length_beta to separate them and back-transform \(\sigma\). In logistic regression every unknown is a \(\boldsymbol{\beta}\)-parameter, so the number of \(\boldsymbol{\beta}\)-parameters is simply the length of the posterior mode vector, length(mvn_result$mode).
generate_glm_post_samples <- function(mvn_result, num_samples)
{
# specify the number of unknown beta parameters
length_beta <- length(mvn_result$mode)
# generate the random samples
beta_samples <- MASS::mvrnorm(n = num_samples,
mu = mvn_result$mode,
Sigma = mvn_result$var_matrix)
# change the data type and name
beta_samples %>%
as.data.frame() %>% tibble::as_tibble() %>%
purrr::set_names(sprintf("beta_%02d", (1:length_beta) - 1))
}
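A quick self-contained check of the sampling logic, using a fake two-parameter Laplace result (the mode and var_matrix values are stand-ins, not outputs from this assignment) and base-R naming rather than the tidyverse pipeline:

```r
# Fake Laplace Approximation result with 2 beta parameters (stand-in values)
fake_mvn <- list(mode = c(0.5, -1.2),
                 var_matrix = diag(c(0.04, 0.09)))

set.seed(8001)
# Draw posterior samples from the MVN approximation
beta_samples <- MASS::mvrnorm(n = 1e4,
                              mu = fake_mvn$mode,
                              Sigma = fake_mvn$var_matrix)

# Name the columns beta_00, beta_01, ... following the assignment convention
colnames(beta_samples) <- sprintf("beta_%02d", 0:(length(fake_mvn$mode) - 1))

# The sample means should be close to the posterior mode
colMeans(beta_samples)
```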
You will now define a function which calculates the posterior samples
on the linear predictor and the event probability. The function,
post_logistic_pred_samples() is started for you in the code
chunk below. It consists of two input arguments Xnew and
Bmat. Xnew is a test design matrix where rows
correspond to prediction points. The matrix Bmat stores the
posterior samples on the \(\boldsymbol{\beta}\)-parameters, where each
row is a posterior sample and each column is a parameter.
Complete the code chunk below by using matrix math to calculate the posterior samples of the linear predictor. Then, calculate the posterior samples of the event probability.
The eta_mat and mu_mat matrices are
returned within a list, similar to how the Umat and
Ymat matrices were returned for the regression
problems.
HINT: The boot::inv.logit() can take a matrix
as an input. When it does, it returns a matrix as a result.
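As a small illustration of the hint (with made-up numbers), boot::inv.logit() applied to a matrix of linear predictor values returns a matrix of event probabilities with the same dimensions:

```r
# a 2 x 2 matrix of hypothetical linear predictor values
eta_demo <- matrix(c(-2, 0, 2, 4), nrow = 2)

# inv.logit() is applied elementwise and the matrix shape is preserved
boot::inv.logit(eta_demo)  # 2 x 2 matrix of probabilities; inv.logit(0) = 0.5
```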
post_logistic_pred_samples <- function(Xnew, Bmat)
{
  # calculate the linear predictor at all prediction points and posterior samples
  eta_mat <- Xnew %*% t(Bmat)
  
  # calculate the event probability
  mu_mat <- boot::inv.logit(eta_mat)
  
  # book keeping
  list(eta_mat = eta_mat, mu_mat = mu_mat)
}
The code chunk below defines a function
summarize_logistic_pred_from_laplace() which manages the
actions necessary to summarize posterior predictions of the event
probability. The first argument, mvn_result, is the Laplace
Approximation object. The second object is the test design matrix,
Xtest, and the third argument, num_samples, is
the number of posterior samples to make. You must follow the comments
within the function in order to generate posterior prediction samples of
the linear predictor and the event probability, and then to summarize
the posterior predictions of the event probability.
The result from summarize_logistic_pred_from_laplace()
summarizes the posterior predicted event probability with the posterior
mean, as well as the 0.05 and 0.95 Quantiles. If you have completed the
post_logistic_pred_samples() function correctly, the
dimensions of the mu_mat matrix should be consistent with
those from the Umat matrix from the regression
problems.
The posterior summary statistics summarize the posterior samples. You
must therefore choose between colMeans() and
rowMeans() as to how to calculate the posterior mean event
probability for each prediction point. The posterior Quantiles are
calculated for you.
Follow the comments in the code chunk below to complete the
definition of the summarize_logistic_pred_from_laplace()
function. You must generate posterior samples, make posterior
predictions, and then summarize the posterior predictions of the event
probability.
HINT: The result from
post_logistic_pred_samples() is a list.
summarize_logistic_pred_from_laplace <- function(mvn_result, Xtest, num_samples)
{
  # generate posterior samples of the beta parameters
  betas <- generate_glm_post_samples(mvn_result, num_samples)
  
  # data type conversion
  betas <- as.matrix(betas)
  
  # make posterior predictions on the test set
  pred_test <- post_logistic_pred_samples(Xtest, betas)
  
  # calculate summary statistics on the posterior predicted probability
  # summarize over the posterior samples: each ROW of mu_mat is a prediction
  # point, so the posterior mean uses rowMeans()
  mu_avg <- rowMeans(pred_test$mu_mat)
  
  # posterior quantiles
  mu_q05 <- apply(pred_test$mu_mat, 1, stats::quantile, probs = 0.05)
  mu_q95 <- apply(pred_test$mu_mat, 1, stats::quantile, probs = 0.95)
  
  # book keeping
  tibble::tibble(
    mu_avg = mu_avg,
    mu_q05 = mu_q05,
    mu_q95 = mu_q95
  ) %>% 
    tibble::rowid_to_column("pred_id")
}
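The dimension bookkeeping behind the rowMeans() choice can be checked with a toy example (the numbers below are hypothetical). Each row of mu_mat corresponds to a prediction point and each column to a posterior sample, so summarizing over the posterior samples means averaging across each row:

```r
# 2 prediction points with 2 features each (hypothetical design matrix)
Xdemo <- matrix(c(1, 1, 0.5, -0.5), nrow = 2)

# 3 posterior samples of the 2 beta parameters (one sample per row)
Bdemo <- matrix(rnorm(6), nrow = 3)

demo_pred <- post_logistic_pred_samples(Xdemo, Bdemo)

dim(demo_pred$mu_mat)  # 2 x 3: (prediction points) x (posterior samples)

rowMeans(demo_pred$mu_mat)  # one posterior mean probability per prediction point
```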
You will not make predictions from all 8 models that you previously trained. Instead, you will focus on model D, model G, and model H.
You must define the visualization grid design matrices
consistent with the model D, model G, and model H formulas. You must
name the design matrices Xviz_D, Xviz_G, and
Xviz_H. You must create the design matrices using the
viz_grid dataframe which was defined at the start of
Problem 04.
Add your code chunks here.
Xviz_D <- model.matrix(~ x1:x3 + x1:x2, data = viz_grid)
Xviz_G <- model.matrix(~ x1 + x2 + x3 + x2:x3 + I(x2^2):I(x3^2), data = viz_grid)
Xviz_H <- model.matrix(~ x1:(x2 + x3) + x1:(x2:x3) + x1:(I(x2^2):I(x3^2)), data = viz_grid)
Summarize the posterior predicted event probability associated with the three models on the visualization grid. After making the predictions, a code chunk is provided for you which generates a figure showing how the posterior predicted probability summaries compare with the observed binary outcomes. Which of the three models appear to better capture the trends in the binary outcome?
Call summarize_logistic_pred_from_laplace() for
all three models on the visualization grid. The object names specify
which model you should make predictions with. For example,
post_pred_summary_D corresponds to the predictions
associated with model D. Specify the number of posterior samples to be
2500. Print the dimensions of the resulting objects to the screen. How
many rows are in each data set?
The prediction summaries should be computed in the code chunk below.
set.seed(8123)
# arguments: mvn_result, Xtest, num_samples
post_pred_summary_D <- summarize_logistic_pred_from_laplace(laplace_D, Xviz_D, 2500)
post_pred_summary_G <- summarize_logistic_pred_from_laplace(laplace_G, Xviz_G, 2500)
post_pred_summary_H <- summarize_logistic_pred_from_laplace(laplace_H, Xviz_H, 2500)
Print the dimensions of the objects to the screen.
###
dim(post_pred_summary_D)
## [1] 1377    4
dim(post_pred_summary_G)
## [1] 1377    4
dim(post_pred_summary_H)
## [1] 1377    4
The code chunk below defines a function for you. The function creates
a figure which visualizes the posterior predictive summary statistics of
the event probability for a single model. The figure is created to focus
on the trend with respect to x3. Facets are used to examine
the influence of x2. The line color and ribbon aesthetics
are used to denote the categorical variable x1. This figure
is specific to the three variable names in this assignment, but it shows
the basic layout required for visualizing predictive trends from
Bayesian logistic regression models with 3 inputs.
viz_bayes_logpost_preds <- function(post_pred_summary, input_df)
{
  post_pred_summary %>% 
    left_join(input_df %>% tibble::rowid_to_column('pred_id'),
              by = 'pred_id') %>% 
    ggplot(mapping = aes(x = x3)) +
    geom_ribbon(mapping = aes(ymin = mu_q05,
                              ymax = mu_q95,
                              group = interaction(x1, x2),
                              fill = x1),
                alpha = 0.25) +
    geom_line(mapping = aes(y = mu_avg,
                            group = interaction(x1, x2),
                            color = x1),
              size = 1.15) +
    facet_wrap( ~ x2, labeller = 'label_both') +
    labs(y = "event probability") +
    theme_bw()
}
Use the viz_bayes_logpost_preds() function to
visualize posterior predictive trends of the event probability for the 3
models: model D, model G, and model H.
Add your code chunks here.
viz_bayes_logpost_preds(post_pred_summary_D, viz_grid)
viz_bayes_logpost_preds(post_pred_summary_G, viz_grid)
viz_bayes_logpost_preds(post_pred_summary_H, viz_grid)
Describe the differences in the predictive trends between the 3 models.
What do you think?
Model D appears to capture the predictive trends well: its predicted probability curves for the different input combinations stay close to one another, with less variation than the trends from models G and H.
You should have noticed a pattern associated with the 8 models that you previously fit. The most complex model, model H, contains all other models! It is a superset of the features from the simpler models. An alternative to training many models of varying complexity is to train a single complex model and use regularization to “turn off” the unimportant features. This way we can find out whether the most complex model can be reduced to a simpler model containing just the key features!
We discussed in lecture how the Lasso penalty or its Bayesian analog
the Double-Exponential prior are capable of turning off the unimportant
features. We focused on regression problems but Lasso can also be
applied to classification problems! In this problem you will use
caret to manage the training, assessment, and tuning of the
glmnet elastic net penalized logistic regression model. The
code chunk below imports the caret package for you.
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
The caret package prefers the binary outcome to be
organized as a factor data type rather than an integer. The data set is
reformatted for you in the code chunk below. The binary outcome
y is converted to a new variable outcome with
values 'event' and 'non_event'. The first
level is forced to be 'event' to be consistent with the
caret preferred structure.
df_caret <- df1 %>%
mutate(outcome = ifelse(y == 1, 'event', 'non_event')) %>%
mutate(outcome = factor(outcome, levels = c("event", "non_event"))) %>%
select(x1, x2, x3, outcome)
df_caret %>% glimpse()
## Rows: 225
## Columns: 4
## $ x1 <chr> "C", "B", "C", "B", "A", "C", "A", "A", "C", "C", "B", "C", "A…
## $ x2 <dbl> 1.682873632, -1.033648456, 0.110854156, 2.032934019, -0.225540…
## $ x3 <dbl> -0.353085685, -0.778102544, 0.757536960, 0.639465847, 0.017483…
## $ outcome <fct> event, non_event, non_event, non_event, non_event, event, even…
You must specify the resampling scheme that caret will use to train,
assess, and tune a model. You used caret in the previous
assignment for a regression problem. Here, you are working with a
classification problem and so you cannot use the same performance metric
as the previous assignment! Although there are multiple classification
metrics we could consider, we will focus on Accuracy in this
problem.
Specify the resampling scheme to be 10 fold with 3 repeats.
Assign the result of the trainControl() function to the
my_ctrl object. Specify the primary performance metric to
be 'Accuracy' and assign that to the my_metric
object.
Add your code chunks here.
my_ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

my_metric <- "Accuracy"
You must train, assess, and tune an elastic net model using the default
caret tuning grid. In the caret::train()
function you must use the formula interface to specify a model
consistent with model H. Thus, your model should interact the
categorical input with the continuous main effects, the
continuous-continuous interaction, and the quadratic continuous
features. However, please
pay close attention to your formula. The binary outcome is now named
outcome and not y. Assign the
method argument to 'glmnet' and set the metric argument to
my_metric. Even though the inputs were standardized for
you, you must also instruct caret to
standardize the features by setting the preProcess argument
equal to c('center', 'scale'). This will give you practice
standardizing inputs. Assign the trControl argument to the
my_ctrl object.
Important: The caret::train() function
works with the formula interface. Thus, even though you are using
glmnet to fit the model, caret does not
require you to organize the design matrix as required by
glmnet! Thus, you do NOT need to remove
the intercept when defining the formula to
caret::train()!
Train, assess, and tune the glmnet elastic net
model consistent with model H with the defined resampling scheme. Assign
the result to the enet_default object and display the
result to the screen.
Which tuning parameter combinations are considered to be the best?
Is the best set of tuning parameters more consistent with Lasso or Ridge regression?
It is more consistent with Lasso, since the best tuning parameter combination has alpha = 1 (with lambda = 0.004769061).
The random seed is set for you for reproducibility.
set.seed(1234)
enet_default <- caret::train(outcome ~ (x2 + x3 + x2:x3 + I(x2^2) + I(x3^2)) * x1,
                             data = df_caret,
                             method = 'glmnet',
                             metric = my_metric,
                             preProcess = c("center", "scale"),
                             trControl = my_ctrl)
enet_default
## glmnet
##
## 225 samples
## 3 predictor
## 2 classes: 'event', 'non_event'
##
## Pre-processing: centered (17), scaled (17)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 202, 202, 202, 203, 202, 202, ...
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0004769061 0.8251372 0.5805807
## 0.10 0.0047690605 0.8282993 0.5802313
## 0.10 0.0476906052 0.8072793 0.4935810
## 0.55 0.0004769061 0.8251372 0.5805807
## 0.55 0.0047690605 0.8298748 0.5824205
## 0.55 0.0476906052 0.8042545 0.4773741
## 1.00 0.0004769061 0.8207235 0.5714822
## 1.00 0.0047690605 0.8328393 0.5904347
## 1.00 0.0476906052 0.8147892 0.5081578
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 1 and lambda = 0.004769061.
Create a custom tuning grid to further tune the elastic net
lambda and alpha tuning parameters.
Create a tuning grid with the expand.grid()
function which has two columns named alpha and
lambda. The alpha variable should be evenly
spaced between 0.1 and 1.0 by increments of 0.1. The lambda
variable should have 25 evenly spaced values in the log-space between
the minimum and maximum lambda values from the caret
default tuning grid. Assign the tuning grid to the
enet_grid object.
How many tuning parameter combinations are you trying out? How many total models will be fit assuming the 5-fold with 3-repeat resampling approach?
HINT: The seq() function includes an argument
by to specify the increment width.
Add your code chunks here.
How many?
We are trying out 10 alpha values × 25 lambda values = 250 tuning parameter combinations. With 5-fold cross-validation repeated 3 times there are 15 resamples, so 250 × 15 = 3750 models will be fit.
enet_grid <- expand.grid(alpha = seq(0.1, 1.0, by = 0.1),
                         lambda = exp(seq(log(0.0004769061), log(0.0476906052),
                                          length.out = 25)))
enet_grid
## alpha lambda
## 1 0.1 0.0004769061
## 2 0.2 0.0004769061
## 3 0.3 0.0004769061
## 4 0.4 0.0004769061
## 5 0.5 0.0004769061
## 6 0.6 0.0004769061
## 7 0.7 0.0004769061
## 8 0.8 0.0004769061
## 9 0.9 0.0004769061
## 10 1.0 0.0004769061
## 11 0.1 0.0005777849
## 12 0.2 0.0005777849
## 13 0.3 0.0005777849
## 14 0.4 0.0005777849
## 15 0.5 0.0005777849
## 16 0.6 0.0005777849
## 17 0.7 0.0005777849
## 18 0.8 0.0005777849
## 19 0.9 0.0005777849
## 20 1.0 0.0005777849
## 21 0.1 0.0007000024
## 22 0.2 0.0007000024
## 23 0.3 0.0007000024
## 24 0.4 0.0007000024
## 25 0.5 0.0007000024
## 26 0.6 0.0007000024
## 27 0.7 0.0007000024
## 28 0.8 0.0007000024
## 29 0.9 0.0007000024
## 30 1.0 0.0007000024
## 31 0.1 0.0008480723
## 32 0.2 0.0008480723
## 33 0.3 0.0008480723
## 34 0.4 0.0008480723
## 35 0.5 0.0008480723
## 36 0.6 0.0008480723
## 37 0.7 0.0008480723
## 38 0.8 0.0008480723
## 39 0.9 0.0008480723
## 40 1.0 0.0008480723
## 41 0.1 0.0010274630
## 42 0.2 0.0010274630
## 43 0.3 0.0010274630
## 44 0.4 0.0010274630
## 45 0.5 0.0010274630
## 46 0.6 0.0010274630
## 47 0.7 0.0010274630
## 48 0.8 0.0010274630
## 49 0.9 0.0010274630
## 50 1.0 0.0010274630
## 51 0.1 0.0012447999
## 52 0.2 0.0012447999
## 53 0.3 0.0012447999
## 54 0.4 0.0012447999
## 55 0.5 0.0012447999
## 56 0.6 0.0012447999
## 57 0.7 0.0012447999
## 58 0.8 0.0012447999
## 59 0.9 0.0012447999
## 60 1.0 0.0012447999
## 61 0.1 0.0015081095
## 62 0.2 0.0015081095
## 63 0.3 0.0015081095
## 64 0.4 0.0015081095
## 65 0.5 0.0015081095
## 66 0.6 0.0015081095
## 67 0.7 0.0015081095
## 68 0.8 0.0015081095
## 69 0.9 0.0015081095
## 70 1.0 0.0015081095
## 71 0.1 0.0018271163
## 72 0.2 0.0018271163
## 73 0.3 0.0018271163
## 74 0.4 0.0018271163
## 75 0.5 0.0018271163
## 76 0.6 0.0018271163
## 77 0.7 0.0018271163
## 78 0.8 0.0018271163
## 79 0.9 0.0018271163
## 80 1.0 0.0018271163
## 81 0.1 0.0022136020
## 82 0.2 0.0022136020
## 83 0.3 0.0022136020
## 84 0.4 0.0022136020
## 85 0.5 0.0022136020
## 86 0.6 0.0022136020
## 87 0.7 0.0022136020
## 88 0.8 0.0022136020
## 89 0.9 0.0022136020
## 90 1.0 0.0022136020
## 91 0.1 0.0026818400
## 92 0.2 0.0026818400
## 93 0.3 0.0026818400
## 94 0.4 0.0026818400
## 95 0.5 0.0026818400
## 96 0.6 0.0026818400
## 97 0.7 0.0026818400
## 98 0.8 0.0026818400
## 99 0.9 0.0026818400
## 100 1.0 0.0026818400
## 101 0.1 0.0032491233
## 102 0.2 0.0032491233
## 103 0.3 0.0032491233
## 104 0.4 0.0032491233
## 105 0.5 0.0032491233
## 106 0.6 0.0032491233
## 107 0.7 0.0032491233
## 108 0.8 0.0032491233
## 109 0.9 0.0032491233
## 110 1.0 0.0032491233
## 111 0.1 0.0039364027
## 112 0.2 0.0039364027
## 113 0.3 0.0039364027
## 114 0.4 0.0039364027
## 115 0.5 0.0039364027
## 116 0.6 0.0039364027
## 117 0.7 0.0039364027
## 118 0.8 0.0039364027
## 119 0.9 0.0039364027
## 120 1.0 0.0039364027
## 121 0.1 0.0047690608
## 122 0.2 0.0047690608
## 123 0.3 0.0047690608
## 124 0.4 0.0047690608
## 125 0.5 0.0047690608
## 126 0.6 0.0047690608
## 127 0.7 0.0047690608
## 128 0.8 0.0047690608
## 129 0.9 0.0047690608
## 130 1.0 0.0047690608
## 131 0.1 0.0057778490
## 132 0.2 0.0057778490
## 133 0.3 0.0057778490
## 134 0.4 0.0057778490
## 135 0.5 0.0057778490
## 136 0.6 0.0057778490
## 137 0.7 0.0057778490
## 138 0.8 0.0057778490
## 139 0.9 0.0057778490
## 140 1.0 0.0057778490
## 141 0.1 0.0070000238
## 142 0.2 0.0070000238
## 143 0.3 0.0070000238
## 144 0.4 0.0070000238
## 145 0.5 0.0070000238
## 146 0.6 0.0070000238
## 147 0.7 0.0070000238
## 148 0.8 0.0070000238
## 149 0.9 0.0070000238
## 150 1.0 0.0070000238
## 151 0.1 0.0084807224
## 152 0.2 0.0084807224
## 153 0.3 0.0084807224
## 154 0.4 0.0084807224
## 155 0.5 0.0084807224
## 156 0.6 0.0084807224
## 157 0.7 0.0084807224
## 158 0.8 0.0084807224
## 159 0.9 0.0084807224
## 160 1.0 0.0084807224
## 161 0.1 0.0102746298
## 162 0.2 0.0102746298
## 163 0.3 0.0102746298
## 164 0.4 0.0102746298
## 165 0.5 0.0102746298
## 166 0.6 0.0102746298
## 167 0.7 0.0102746298
## 168 0.8 0.0102746298
## 169 0.9 0.0102746298
## 170 1.0 0.0102746298
## 171 0.1 0.0124479981
## 172 0.2 0.0124479981
## 173 0.3 0.0124479981
## 174 0.4 0.0124479981
## 175 0.5 0.0124479981
## 176 0.6 0.0124479981
## 177 0.7 0.0124479981
## 178 0.8 0.0124479981
## 179 0.9 0.0124479981
## 180 1.0 0.0124479981
## 181 0.1 0.0150810939
## 182 0.2 0.0150810939
## 183 0.3 0.0150810939
## 184 0.4 0.0150810939
## 185 0.5 0.0150810939
## 186 0.6 0.0150810939
## 187 0.7 0.0150810939
## 188 0.8 0.0150810939
## 189 0.9 0.0150810939
## 190 1.0 0.0150810939
## 191 0.1 0.0182711623
## 192 0.2 0.0182711623
## 193 0.3 0.0182711623
## 194 0.4 0.0182711623
## 195 0.5 0.0182711623
## 196 0.6 0.0182711623
## 197 0.7 0.0182711623
## 198 0.8 0.0182711623
## 199 0.9 0.0182711623
## 200 1.0 0.0182711623
## 201 0.1 0.0221360184
## 202 0.2 0.0221360184
## 203 0.3 0.0221360184
## 204 0.4 0.0221360184
## 205 0.5 0.0221360184
## 206 0.6 0.0221360184
## 207 0.7 0.0221360184
## 208 0.8 0.0221360184
## 209 0.9 0.0221360184
## 210 1.0 0.0221360184
## 211 0.1 0.0268183985
## 212 0.2 0.0268183985
## 213 0.3 0.0268183985
## 214 0.4 0.0268183985
## 215 0.5 0.0268183985
## 216 0.6 0.0268183985
## 217 0.7 0.0268183985
## 218 0.8 0.0268183985
## 219 0.9 0.0268183985
## 220 1.0 0.0268183985
## 221 0.1 0.0324912314
## 222 0.2 0.0324912314
## 223 0.3 0.0324912314
## 224 0.4 0.0324912314
## 225 0.5 0.0324912314
## 226 0.6 0.0324912314
## 227 0.7 0.0324912314
## 228 0.8 0.0324912314
## 229 0.9 0.0324912314
## 230 1.0 0.0324912314
## 231 0.1 0.0393640253
## 232 0.2 0.0393640253
## 233 0.3 0.0393640253
## 234 0.4 0.0393640253
## 235 0.5 0.0393640253
## 236 0.6 0.0393640253
## 237 0.7 0.0393640253
## 238 0.8 0.0393640253
## 239 0.9 0.0393640253
## 240 1.0 0.0393640253
## 241 0.1 0.0476906052
## 242 0.2 0.0476906052
## 243 0.3 0.0476906052
## 244 0.4 0.0476906052
## 245 0.5 0.0476906052
## 246 0.6 0.0476906052
## 247 0.7 0.0476906052
## 248 0.8 0.0476906052
## 249 0.9 0.0476906052
## 250 1.0 0.0476906052
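The size of the custom grid can be confirmed directly. The lambda values in the one-liner below are placeholders (1 through 25) since only the row count matters:

```r
# 10 alpha values x 25 lambda values = 250 tuning parameter combinations
nrow(expand.grid(alpha = seq(0.1, 1.0, by = 0.1), lambda = 1:25))
## [1] 250
```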
Train, assess, and tune the elastic net model with the custom
tuning grid and assign the result to the enet_tune object.
You should specify the arguments to caret::train()
consistent with your solution in Problem 5b), except you should also
assign enet_grid to the tuneGrid
argument.
Do not print the result to the screen. Instead use the
default plot method to visualize the resampling results. Assign the
xTrans argument to log in the default plot
method call. Use the $bestTune field to print the
identified best tuning parameter values to the screen. Is the identified
best elastic net model more similar to Lasso or Ridge
regression?
The random seed is set for you for reproducibility. You may add more code chunks if you like.
set.seed(1234)
# note: xTrans belongs to the default plot method call, NOT to train()
enet_caret_tune <- train(outcome ~ (x2 + x3 + x2:x3 + I(x2^2) + I(x3^2)) * x1,
                         data = df_caret,
                         method = 'glmnet',
                         metric = my_metric,
                         preProcess = c("center", "scale"),
                         trControl = my_ctrl,
                         tuneGrid = enet_grid)
enet_caret_tune$bestTune
##     alpha   lambda
## 163   0.3 0.01027463
plot(enet_caret_tune, xTrans = log)
What do you think?
With alpha = 0.3 selected, the best elastic net model is now more consistent with Ridge regression, since alpha is closer to the Ridge end (alpha = 0) of the elastic net spectrum.
Print the coefficients to the screen for the tuned elastic net model. Which coefficients are non-zero? Has the complex model H been converted to a simpler model?
### add more code chunks if you'd like
coef(enet_caret_tune$finalModel, s = enet_caret_tune$bestTune$lambda)

coefplot::coefplot(enet_caret_tune$finalModel)
What do you think?
Yes, it has been converted to a simpler model: the elastic net penalty has shrunk several coefficients to (or very near) zero, so only the most important features remain active.
Let’s now visualize the predictions of the event probability from the
tuned elastic net penalized logistic regression model. All
caret trained models make predictions with a
predict() function. The first argument is the
caret trained object and the second argument,
newdata, is the new data set to make predictions with.
Earlier in the semester in homework 03, you made predictions from
caret trained binary classifiers. That assignment discussed
that the optional third argument type dictated the “type”
of prediction to make. Setting type = 'prob' instructs the
predict() function to return the class predicted
probabilities.
Complete the code chunk below. You must make predictions on
the visualization grid, viz_grid, using the tuned elastic
net model enet_tune. Instruct the predict()
function to return the probabilities by setting
type = 'prob'.
pred_viz_enet_probs <- predict(enet_caret_tune, newdata = viz_grid, type = 'prob')
The code chunk below is completed for you. The
pred_viz_enet_probs dataframe is column bound to the
viz_grid dataframe. The new object,
viz_enet_df, provides the class predicted probabilities for
each input combination in the visualization grid. A glimpse is printed
to the screen. Please note the eval flag is set to
eval=FALSE in the code chunk below. You must change
eval to eval=TRUE in the chunk options to make
sure the code chunk below runs when you knit the markdown.
viz_enet_df <- viz_grid %>% bind_cols(pred_viz_enet_probs)
viz_enet_df %>% glimpse()
## Rows: 1,377
## Columns: 5
## $ x1 <chr> "C", "B", "A", "C", "B", "A", "C", "B", "A", "C", "B", "A", …
## $ x2 <dbl> -3.00, -3.00, -3.00, -2.25, -2.25, -2.25, -1.50, -1.50, -1.5…
## $ x3 <dbl> -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, …
## $ event <dbl> 0.9999022, 0.8864551, 0.9984169, 0.9999666, 0.9958465, 0.999…
## $ non_event <dbl> 9.783916e-05, 1.135449e-01, 1.583115e-03, 3.339000e-05, 4.15…
The glimpse reveals that the event column stores the
predicted event probability. You must visualize the
predicted event probability in a manner consistent with the
viz_bayes_logpost_preds() function. The caret
trained object does not return uncertainty estimates from the
glmnet model and so you will not include uncertainty
intervals as ribbons. You will visualize the predicted probability as a
line (curve) with respect to x3, for each combination of
x2 and x1.
Pipe the viz_enet_df object to
ggplot(). Map the x aesthetic to the
x3 variable and the y aesthetic to the
event variable. Add a geom_line() layer and
map the color aesthetic to the x1 variable.
Manually assign the size to 1.2. Create the facets by
including the facet_wrap() function and specify the facets
are “functions of” the x2 input.
Add your code chunk here.
viz_enet_df %>%
  ggplot(mapping = aes(x = x3, y = event)) +
  geom_line(mapping = aes(color = x1), size = 1.2) +
  facet_wrap(~x2)
Are the predicted trends from the tuned elastic net model consistent with the behavior visualized by the Bayesian models?
What do you think?
Yes, the predicted event probability trends from the tuned elastic net model are consistent with the behavior visualized by the Bayesian models.